Abstract

The monotone duality problem is defined as follows: Given two monotone formulas $f$ and $g$ in irredundant DNF, decide whether $f$ and $g$ are dual. This problem is the same as duality testing for hypergraphs, that is, checking whether a hypergraph $\mathcal{H}$ consists of precisely all minimal transversals of a simple hypergraph $\mathcal{G}$. By exploiting a recent problem-decomposition method by Boros and Makino (ICALP 2009), we show that duality testing for hypergraphs, and thus for monotone DNFs, is feasible in $\mathrm{DSPACE}[\log^2 n]$, i.e., in quadratic logspace. As the monotone duality problem is equivalent to a number of problems in the areas of databases, data mining, and knowledge discovery, the results presented here yield new complexity results for those problems, too. For example, it follows from our results that whenever, for a Boolean-valued relation (whose attributes represent items), a number of maximal frequent itemsets and a number of minimal infrequent itemsets are known, it can be decided in quadratic logspace whether there exist additional maximal frequent or minimal infrequent itemsets.

Keywords: Duality testing, frequent itemset, hypergraph, transversal, data mining.

1 Introduction 

This paper derives new complexity bounds for the problem DUAL of deciding whether two irredundant monotone Boolean formulas in DNF are mutually dual, or, equivalently, of deciding whether two simple hypergraphs are dual, i.e., whether each of these hypergraphs consists precisely of the minimal transversals of the other. While the exact complexity remains open, there is progress: We prove a $\mathrm{DSPACE}[\log^2 n]$ upper bound for DUAL, and another, presumably tighter bound that is expressed in terms of sophisticated machine-based complexity classes. The DUAL problem is actually one of the most mysterious problems in theoretical computer science. It has many applications, especially in the database, data mining, and knowledge discovery areas [5, 6, 19], some of which will be mentioned below. Let us first describe the DUAL problem more formally.

Duality testing for monotone DNFs and hypergraphs. Two Boolean formulas $f(x_1, x_2, \ldots, x_n)$ and $g(x_1, x_2, \ldots, x_n)$ on propositional variables $x_1, x_2, \ldots, x_n$ are dual if

$$f(x_1, x_2, \ldots, x_n) = \neg g(\neg x_1, \neg x_2, \ldots, \neg x_n).$$

A monotone DNF is irredundant if the set of variables in none of its disjuncts is covered by the variable set of any other disjunct. The duality testing problem DUAL is the problem of testing whether two irredundant monotone DNFs $f$ and $g$ are dual.
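The definition of duality can be checked directly, if very inefficiently, by enumerating all $2^n$ truth assignments. The following Python sketch only illustrates the definition above and is not a practical algorithm; it assumes a hypothetical encoding of a monotone DNF as a list of disjuncts, each given as a set of variable indices $0, \ldots, n-1$.

```python
from itertools import product


def eval_dnf(dnf, assignment):
    """Evaluate a monotone DNF (list of sets of variable indices) on a tuple of booleans."""
    return any(all(assignment[v] for v in term) for term in dnf)


def are_dual(f, g, n):
    """Brute-force test of f(x) == not g(not x) over all 2**n assignments (exponential time)."""
    for x in product([False, True], repeat=n):
        negated = tuple(not b for b in x)
        if eval_dnf(f, x) != (not eval_dnf(g, negated)):
            return False
    return True


# The majority function x0x1 v x0x2 v x1x2 is self-dual.
majority = [{0, 1}, {0, 2}, {1, 2}]
print(are_dual(majority, majority, 3))      # True
print(are_dual([{0, 1}], [{0}, {1}], 2))    # True: the dual of x0x1 is x0 v x1
print(are_dual([{0, 1}], [{0, 1}], 2))      # False
```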

A hypergraph $\mathcal{H}$ is a finite family of finite sets (also called hyperedges) defined over some set of vertices $V(\mathcal{H})$. By default, if $V(\mathcal{H})$ is not explicitly specified, the set of vertices of $\mathcal{H}$ is $\bigcup_{E \in \mathcal{H}} E$. A transversal of $\mathcal{H}$ is a subset of $V(\mathcal{H})$ that meets all hyperedges of $\mathcal{H}$, and a minimal transversal of $\mathcal{H}$ is a transversal of $\mathcal{H}$ that does not contain any other transversal as a subset. The set of all minimal transversals of a hypergraph $\mathcal{H}$ is denoted by $\mathrm{tr}(\mathcal{H})$. The Hypergraph Duality Problem is the problem of deciding for two simple hypergraphs $\mathcal{G}$ and $\mathcal{H}$ whether $\mathcal{G} = \mathrm{tr}(\mathcal{H})$. In case $\mathcal{G} \neq \mathrm{tr}(\mathcal{H})$, one may want to witness this by exhibiting a new transversal of $\mathcal{H}$ with respect to $\mathcal{G}$. This is a transversal of $\mathcal{H}$ that has no hyperedge of $\mathcal{G}$ as a subset. Obviously, every new transversal of $\mathcal{H}$ contains at least one new minimal transversal of $\mathcal{H}$ w.r.t. $\mathcal{G}$, but it need not be minimal itself.
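These notions can be illustrated with a small Python sketch that computes $\mathrm{tr}(\mathcal{H})$ by exhaustive search over vertex subsets in order of increasing size (exponential time, for illustration only) and tests whether a set is a new transversal. Hypergraphs are assumed to be given as collections of Python sets; all function names are hypothetical.

```python
from itertools import combinations


def is_transversal(t, hypergraph):
    """t meets every hyperedge."""
    return all(t & e for e in hypergraph)


def minimal_transversals(hypergraph, vertices=None):
    """tr(H) by exhaustive search; by default V(H) is the union of the hyperedges."""
    if vertices is None:
        vertices = set().union(*hypergraph)
    verts = sorted(vertices)
    found = []
    for k in range(len(verts) + 1):
        for combo in combinations(verts, k):
            t = frozenset(combo)
            # Subsets are visited by increasing size, so t is minimal iff it is a
            # transversal containing no previously found minimal transversal.
            if is_transversal(t, hypergraph) and not any(m <= t for m in found):
                found.append(t)
    return set(found)


def is_new_transversal(t, h, g):
    """t is a transversal of H that has no hyperedge of G as a subset."""
    return is_transversal(t, h) and not any(e <= t for e in g)


H = [{1, 2}, {2, 3}]
print(minimal_transversals(H))                      # {2} and {1, 3}, in some order
G = [{1, 3}]                                        # G != tr(H): {2} is missing
print(is_new_transversal(frozenset({1, 2}), H, G))  # True, although {1, 2} is not minimal
```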

It is well known that DNF duality and hypergraph duality are actually the same problem (see [5]). In fact, two irredundant monotone DNFs $f$ and $g$ are dual iff their hypergraphs are dual. The hypergraph associated with a monotone DNF has precisely one hyperedge for each disjunct, consisting of the set of all variables of this disjunct. Vice versa, one can trivially associate an irredundant DNF with each simple hypergraph and thus reduce hypergraph duality to DNF duality. Given that these problems essentially coincide (and can be reduced to each other via trivial reductions that are much easier than logspace reductions), we regard them as one and the same problem, which we refer to as DUAL.

The duality problem in data mining, database theory, and knowledge discovery. Most prominently, the DUAL problem is at the core of a number of important data mining and database problems. It is central, for example, to the determination of the maximal frequent and minimal infrequent sets in data mining. More precisely, consider a Boolean-valued data relation $M$ over a set $S$ of attributes called items, and a threshold $z$ with $0 \le z \le |M|$. Each subset $U \subseteq S$ is called an itemset. For each tuple $t$ of $M$, let $\mathit{items}(t) = \{A \in S \mid t[A] = 1\}$. The frequency $f(U)$ of an itemset $U$ is the number of tuples $t$ of $M$ such that $U \subseteq \mathit{items}(t)$. $U$ is frequent if $f(U) \ge z$ and infrequent otherwise. In data mining, one is interested in computing the maximal frequent sets and the minimal infrequent sets (under set inclusion) for $M$ and $z$. Let us refer to the former as $\mathrm{IS}^+(M, z)$ and to the latter as $\mathrm{IS}^-(M, z)$. Clearly, both $\mathrm{IS}^+(M, z)$ and $\mathrm{IS}^-(M, z)$ are simple hypergraphs over $S$, and we abbreviate them by $\mathrm{IS}^+$ and $\mathrm{IS}^-$, respectively, when $M$ and $z$ are understood. As a fundamental result towards the aim of computing $\mathrm{IS}^+$ and $\mathrm{IS}^-$, it was shown in [19] that the minimal infrequent itemsets are exactly the minimal transversals of the complements of the maximal frequent itemsets, i.e., $\mathrm{IS}^- = \mathrm{tr}(\overline{\mathrm{IS}^+})$, and thus also $\overline{\mathrm{IS}^+} = \mathrm{tr}(\mathrm{IS}^-)$, where for $\mathcal{A} \subseteq 2^S$, $\overline{\mathcal{A}} = \{S - A \mid A \in \mathcal{A}\}$.
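As a sanity check of the identity $\mathrm{IS}^- = \mathrm{tr}(\overline{\mathrm{IS}^+})$, the following Python sketch computes $\mathrm{IS}^+$ and $\mathrm{IS}^-$ by brute force for a tiny relation and compares them. It assumes that each tuple $t$ is represented directly by $\mathit{items}(t)$ and uses the convention that $U$ is frequent iff $f(U) \ge z$; the data and function names are hypothetical.

```python
from itertools import combinations


def all_subsets(items):
    """All subsets of items, ordered by increasing size."""
    s = sorted(items)
    return [frozenset(c) for k in range(len(s) + 1) for c in combinations(s, k)]


def frequency(itemset, relation):
    """Number of tuples t with itemset <= items(t)."""
    return sum(1 for row in relation if itemset <= row)


def border(relation, items, z):
    """Return (IS+, IS-): maximal frequent and minimal infrequent itemsets."""
    subsets = all_subsets(items)
    freq = [u for u in subsets if frequency(u, relation) >= z]
    infreq = [u for u in subsets if frequency(u, relation) < z]
    is_plus = {u for u in freq if not any(u < v for v in freq)}
    is_minus = {u for u in infreq if not any(v < u for v in infreq)}
    return is_plus, is_minus


def minimal_transversals(hypergraph, vertices):
    found = []
    for t in all_subsets(vertices):  # increasing size, so the subset test ensures minimality
        if all(t & e for e in hypergraph) and not any(m <= t for m in found):
            found.append(t)
    return set(found)


S = frozenset("abc")
M = [frozenset("ab"), frozenset("ac"), frozenset("abc")]  # items(t) for each tuple t
is_plus, is_minus = border(M, S, z=2)
complements = [S - a for a in is_plus]
print(is_plus, is_minus)
print(is_minus == minimal_transversals(complements, S))   # True
```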

Let MAXFREQ-MININFREQ-IDENTIFICATION be the following decision problem in data mining: Given $M$, $z$, a set $\mathcal{G} \subseteq \mathrm{IS}^-(M, z)$, and a set $\mathcal{H} \subseteq \mathrm{IS}^+(M, z)$, decide whether $\mathcal{H} = \mathrm{IS}^+(M, z)$ and $\mathcal{G} = \mathrm{IS}^-(M, z)$, that is, whether there exist no additional maximal frequent or minimal infrequent itemsets for $M$ and $z$ that are not already in $\mathcal{G} \cup \mathcal{H}$. In [19] it was shown that there exist no such additional itemsets iff $\mathcal{G} = \mathrm{tr}(\overline{\mathcal{H}})$. With regard to computational complexity, we thus have:

Proposition 1.1 ([19]). MAXFREQ-MININFREQ-IDENTIFICATION is logspace-equivalent to DUAL.

The results of [19] are at the basis of a host of algorithms for maximal frequent itemset generation that compute both $\mathrm{IS}^+$ and $\mathrm{IS}^-$ incrementally. These algorithms initialize $\mathcal{G}$ and $\mathcal{H}$ with some easy-to-compute subsets of $\mathrm{IS}^-$ and $\mathrm{IS}^+$, respectively. Then, at each step, they check whether $\mathcal{G} = \mathrm{tr}(\overline{\mathcal{H}})$ holds for the current sets, and if not, compute one or more new transversals from which new maximal frequent itemsets or minimal infrequent itemsets can be computed easily (see, e.g., [?]). Thus, not only is the decision problem DUAL of relevance to data mining, but so is the problem of effectively computing a new transversal that acts as a witness that $\mathcal{G} \neq \mathrm{tr}(\overline{\mathcal{H}})$. In the present paper, we obtain results on the complexity of this latter problem, too.

Another interesting related database problem is the ADDITIONAL KEY FOR INSTANCE problem for explicitly given relational instances: Given a relational instance $R$ over an attribute set $S$ and a set $\mathcal{K}$ of minimal keys for $R$, determine whether there exists a key for $R$ that is not already contained in $\mathcal{K}$. This problem, which was shown to be equivalent to DUAL in the early nineties [?], may be of renewed interest in our times of Big Data, where we are faced with massive data tables.

Proposition 1.2 ([5]). The ADDITIONAL KEY FOR INSTANCE problem is logspace-equivalent to DUAL. Moreover, enumerating the minimal keys of a relational instance $R$ is equivalent to enumerating the set $\mathrm{tr}(\mathcal{H})$ for some hypergraph $\mathcal{H}$ that is logspace-computable from $R$.

Other related problems equivalent to DUAL or to its complement deal with the construction of Armstrong relations for sets of functional dependencies [5]; see also [17, 4].

Let us finally briefly mention a completely different problem from the area of distributed databases. For quorum-based updates [24] in distributed databases, the concept of a coterie, which is essentially a hypergraph of pairwise intersecting quorums, has been introduced, and one is specifically interested in so-called non-dominated coteries (for definitions and details, see [12, 20, ?]). The following was proven:

Proposition 1.3 ([20, 5]). A coterie $\mathcal{H}$ is non-dominated iff $\mathrm{tr}(\mathcal{H}) = \mathcal{H}$.
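Proposition 1.3 yields a direct, if exponential-time, test for non-domination. The Python sketch below assumes that a coterie is given as a list of vertex sets and computes $\mathrm{tr}$ by brute force; the function names are hypothetical and the code is illustrative only.

```python
from itertools import combinations


def minimal_transversals(hypergraph):
    """tr(H) by exhaustive search over subsets of the union of the hyperedges."""
    verts = sorted(set().union(*hypergraph))
    found = []
    for k in range(len(verts) + 1):
        for combo in combinations(verts, k):
            t = frozenset(combo)
            if all(t & e for e in hypergraph) and not any(m <= t for m in found):
                found.append(t)
    return set(found)


def is_non_dominated(coterie):
    """Proposition 1.3: a coterie H is non-dominated iff tr(H) = H."""
    return minimal_transversals(coterie) == {frozenset(q) for q in coterie}


print(is_non_dominated([{1, 2}, {1, 3}, {2, 3}]))  # True: majority quorums
print(is_non_dominated([{1, 2}, {2, 3}]))          # False: tr = {{2}, {1, 3}}
```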

There is a large number of applications of the DUAL problem and of hypergraph dualization in the areas of knowledge discovery, machine learning, and, more generally, AI and knowledge representation. Just to mention a few: learning monotone Boolean CNFs and DNFs with membership queries [19], model-based diagnosis [26, ?], computing a Horn approximation to a non-Horn theory [22, 15], and computing minimal abductive explanations to observations [?]. Surveys of these and other applications and further references can be found in [?].

Known complexity results. The exact complexity of DUAL has remained an open problem. Fredman and Khachiyan [11] have shown that DUAL is in $\mathrm{DTIME}[n^{O(\log n)}]$; more precisely, that it is contained in $\mathrm{DTIME}[n^{4\chi(n) + O(1)}]$, where $\chi(n)$ is defined by $\chi(n)^{\chi(n)} = n$. Eiter, Gottlob, and Makino [7], and independently Kavvadias and Stavropoulos [23], have shown that DUAL is in the complexity class co-$\beta_2\mathrm{P}$, which means that the complement of DUAL can be solved in polynomial time with $O(\log^2 n)$ nondeterministic bits. This small amount of nondeterminism can actually be improved to $O(\chi(n) \log n)$, which is $o(\log^2 n)$; see [?].
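To get a feeling for how slowly $\chi(n)$ grows compared with $\log n$, one can solve $\chi^{\chi} = n$ numerically. The Python sketch below uses bisection on the equivalent equation $\chi \ln \chi = \ln n$; this numerical approach is our own choice for illustration and not part of the cited results.

```python
import math


def chi(n):
    """Solve chi**chi == n (n >= 1) by bisection on chi * ln(chi) == ln(n)."""
    target = math.log(n)
    lo, hi = 1.0, max(2.0, target + 1.0)  # lo*ln(lo) <= target <= hi*ln(hi)
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid * math.log(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for n in (2**16, 2**64, 2**256):
    print(n.bit_length() - 1, round(chi(n), 3))  # log2(n) next to chi(n)
```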

Research question tackled. The question of the space efficiency of DUAL, namely, whether DUAL can be solved space-efficiently, has not been satisfactorily answered, although it has been posed several times since 1995, for example in [5, 28, 9]. We believe that this is an important question, which may actually turn out to be of practical relevance when mining large data sets with terabytes of data. This is the main problem we tackle. In addition, we aim at obtaining a better understanding of the DUAL problem in terms of machine-based structural complexity.

Results. We show in this paper that DUAL is indeed in $\mathrm{DSPACE}[\log^2 n]$, which is a very low class in POLYLOGSPACE. From this, modulo the widely believed assumption that PTIME $\not\subseteq$ POLYLOGSPACE, we obtain satisfactory evidence that DUAL is not PTIME-hard, which answers another complexity question posed in [9]. Our results are based on a careful analysis of a recent problem-decomposition method by Boros and Makino [2]. Their decomposition method actually yields a parallel algorithm that solves DUAL on an EREW PRAM in $O(\log^2 n)$ time using $n^5$ processors. However, it is currently not known whether such EREW PRAMs can be simulated in $\mathrm{DSPACE}[\log^2 n]$, and this is actually considered to be rather unlikely. Nevertheless, Boros and Makino's algorithm does not seem to exploit the full potential of a PRAM, and by taking into account the restricted pattern of information flow imposed by the specific self-reductions used in their algorithm, we succeeded in showing membership of DUAL in $\mathrm{DSPACE}[\log^2 n]$.

Complexity theorists have very good reasons to assume that the space class $\mathrm{DSPACE}[\log^2 n]$ is incomparable, with respect to containment, with the class co-$\beta_2\mathrm{P}$. It is thus somewhat unsatisfactory to have two incomparable upper bounds for DUAL, which suggests that, most likely, better bounds exist. This encouraged us to look for a tighter upper bound for DUAL in terms of machine-based complexity models that would be contained in both $\mathrm{DSPACE}[\log^2 n]$ and co-$\beta_2\mathrm{P}$, and we succeeded in finding one. We can, in fact, show that DUAL belongs to the "guess and check" class $\mathrm{GC}(\log^2 n, [\mathrm{LOGSPACE}_{\mathrm{pol}}]^{\log})$. This somewhat exotic new machine-based complexity class contains precisely all problems that can be solved by first guessing $O(\log^2 n)$ bits and then checking the correctness of this guess in $[\mathrm{LOGSPACE}_{\mathrm{pol}}]^{\log}$, a complexity class contained in PTIME that we define in the present paper. We hope that this tighter new bound will provide better insight into the very nature of the DUAL problem, and possibly hint at the right direction for future research towards finding a matching upper bound.

Roadmap. The paper is organised as follows. In the next section, we discuss decomposition methods for DUAL and give a succinct description of the method of Boros and Makino, which we consider to be the currently most advanced method. In Section 3, we define complexity classes based on iterated self-compositions of functions and prove a useful complexity-theoretic lemma. In Section 4, we use this lemma to prove our main result, namely that DUAL is in $\mathrm{DSPACE}[\log^2 n]$. Finally, in Section 5, we provide our tighter structural complexity bound for DUAL. The paper is concluded in Section 6, where we also exhibit a diagram (see Fig. 1 on page 11) that relates all relevant complexity classes and highlights the new upper bounds.

2 The Decomposition Method by Boros and Makino 

Most algorithms for deciding DUAL rely on decompositions that start with an original DUAL instance and recursively transform it into a conjunction of smaller instances, until each instance is either seen to be a no-instance because it violates necessary conditions for duality, or is small and efficiently decidable. Such decompositions are also known as self-reductions; see, e.g., Section 5.3 of [?]. The decomposition process corresponds in the obvious way to a decomposition tree. Different decomposition methods give rise to decomposition trees of different shapes and depths. For example, the well-known algorithm A by Fredman and Khachiyan [11] produces a "skinny" binary decomposition tree of depth linear in the input volume $|\mathcal{G}| \times |\mathcal{H}|$, while their algorithm B produces a non-binary tree of similar depth, but with fewer nodes. Later, decomposition methods giving rise to trees of polylogarithmic depth were published. In particular, the methods of Kavvadias and Stavropoulos [23] as well as the two methods by Elbassioni in [10] give rise to decomposition trees of polylogarithmic depth. Finally, decomposition methods yielding trees of logarithmic depth were presented by Gaur [13] (see also Gaur and Krishnamurti [14]) and, more recently, by Boros and Makino [2]. As we will show, the logarithmic-depth decomposition trees generated by these methods can be used to show that DUAL is in $\mathrm{DSPACE}[\log^2 n]$. In particular, we use the elegant decomposition method of Boros and Makino [2] to prove this, but we could have used Gaur's [13] in a similar fashion. In the rest of this section, we give a succinct description of the method of Boros and Makino that contains all the essentials we need for our subsequent complexity analysis. It is assumed that for the input instance $I = (\mathcal{G}, \mathcal{H})$ we have $|\mathcal{H}| \le |\mathcal{G}|$, and that $\mathcal{G} \subseteq \mathrm{tr}(\mathcal{H})$ and $\mathcal{H} \subseteq \mathrm{tr}(\mathcal{G})$. Clearly, this can be tested in logarithmic space.

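For an input instance $I = (\mathcal{G}, \mathcal{H})$ of DUAL over a vertex set $V$, let $T(\mathcal{G}, \mathcal{H})$ denote its decomposition tree. Let

$$\mathbb{N}^{\mathcal{H}} = \mathbb{N}^0 \cup \mathbb{N}^1 \cup \mathbb{N}^2 \cup \cdots \cup \mathbb{N}^{\lfloor \log |\mathcal{H}| \rfloor},$$

where $\mathbb{N}$ denotes the natural numbers and $\mathbb{N}^0$ stands for the empty sequence $()$. Thus $\mathbb{N}^{\mathcal{H}}$ contains all sequences of natural numbers of length at most $\lfloor \log |\mathcal{H}| \rfloor$.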

Each node $\alpha$ of $T(\mathcal{G}, \mathcal{H})$ has five data structures associated with it:

(i) A unique label $\mathrm{label}(\alpha)$ consisting of a sequence in $\mathbb{N}^{\mathcal{H}}$. In particular, the root $\alpha_0$ of $T(\mathcal{G}, \mathcal{H})$ is labeled by the empty sequence $()$, and the $i$-th child of a node labeled $(j_1, \ldots, j_k)$ is labeled $(j_1, \ldots, j_k, i)$.

(ii) A set $S_\alpha \subseteq V(\mathcal{G})$.

(iii) An instance of DUAL, $\mathrm{inst}(\alpha) = (\mathcal{G}_{S_\alpha}, \mathcal{H}_{S_\alpha})$, where $\mathcal{G}_{S_\alpha} = \{E \cap S_\alpha \mid E \in \mathcal{G}\}$ and $\mathcal{H}_{S_\alpha} = \{E \in \mathcal{H} \mid E \subseteq S_\alpha\}$.

(iv) A marking $\mathrm{mark}(\alpha) \in \{\text{DONE}, \text{FAIL}, \text{NIL}\}$, where each leaf of the decomposition tree will be marked with DONE or FAIL, and each non-leaf is marked with the dummy value NIL. Intuitively, each leaf marked DONE identifies a branch that does not contradict $\mathcal{H} = \mathrm{tr}(\mathcal{G})$, whereas a leaf marked FAIL identifies a branch that proves that $\mathcal{H} \neq \mathrm{tr}(\mathcal{G})$.

(v) A set of vertices $t(\alpha) \subseteq V(\mathcal{G})$. This set is empty for each node not marked FAIL and, in case $\alpha$ is marked FAIL, contains a witness for $\mathcal{H} \neq \mathrm{tr}(\mathcal{G})$ in the form of a new transversal of $\mathcal{G}$ with respect to $\mathcal{H}$.

Let us now describe in detail the method for building $T(\mathcal{G}, \mathcal{H})$ and deciding whether $\mathcal{H} = \mathrm{tr}(\mathcal{G})$. At each stage of the algorithm, let us denote the set of current leaf nodes by $\Lambda$. The tree is built as follows. The input instance $(\mathcal{G}, \mathcal{H})$ is first transformed into an initial tree consisting of the root $\alpha_0$ with $\mathrm{label}(\alpha_0) = ()$, $S_{\alpha_0} = V$, $\mathrm{inst}(\alpha_0) = (\mathcal{G}, \mathcal{H})$, $\mathrm{mark}(\alpha_0) = \text{NIL}$, and $t(\alpha_0) = \emptyset$. At each stage of the decomposition, first, each leaf $\alpha \in \Lambda$ with $|\mathcal{H}_{S_\alpha}| \le 1$ is marked by the following procedure; it is then not expanded further and is thus a leaf of the final tree $T(\mathcal{G}, \mathcal{H})$:

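PROCEDURE MARKSMALL($\alpha$):
CASE 1. IF $|\mathcal{H}_{S_\alpha}| = 0$ and $\emptyset \notin \mathcal{G}_{S_\alpha}$, THEN { $\mathrm{mark}(\alpha)$ := FAIL; $t(\alpha) := S_\alpha$ }.
CASE 2. IF $|\mathcal{H}_{S_\alpha}| = 0$ and $\emptyset \in \mathcal{G}_{S_\alpha}$, THEN { $\mathrm{mark}(\alpha)$ := DONE; $t(\alpha) := \emptyset$ }.
CASE 3. IF $\mathcal{H}_{S_\alpha} = \{H\}$ and $\{\{i\} \mid i \in H\} \subseteq \mathcal{G}_{S_\alpha}$, THEN { $\mathrm{mark}(\alpha)$ := DONE; $t(\alpha) := \emptyset$ }.

CASE 4. OTHERWISE let $\mathrm{mark}(\alpha)$ := FAIL, and let $t(\alpha) := S_\alpha - \{i\}$ for some arbitrarily chosen $i \in H$ with $\{i\} \notin \mathcal{G}_{S_\alpha}$.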
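A direct transcription of MARKSMALL into Python may help in following the four cases. This is a sketch under our own assumptions: vertex sets are frozensets, $\mathcal{G}_{S_\alpha}$ and $\mathcal{H}_{S_\alpha}$ are computed as in item (iii) above, and the "arbitrarily chosen" vertex of Case 4 is taken to be the smallest one, which makes the procedure deterministic. The names `restrict` and `mark_small` are hypothetical.

```python
def restrict(g, h, s):
    """inst(alpha) = (G_S, H_S) with G_S = {E & S : E in G} and H_S = {E in H : E <= S}."""
    g_s = {frozenset(e) & s for e in g}
    h_s = {frozenset(e) for e in h if frozenset(e) <= s}
    return g_s, h_s


def mark_small(g, h, s):
    """Return (mark, t) for a leaf alpha with S_alpha = s and |H_S| <= 1."""
    g_s, h_s = restrict(g, h, s)
    if len(h_s) > 1:
        raise ValueError("MARKSMALL applies only when |H_S| <= 1")
    if not h_s:
        if frozenset() not in g_s:
            return "FAIL", s                 # Case 1
        return "DONE", frozenset()           # Case 2
    (edge,) = h_s
    missing = [i for i in sorted(edge) if frozenset({i}) not in g_s]
    if not missing:
        return "DONE", frozenset()           # Case 3
    return "FAIL", s - {missing[0]}          # Case 4 (smallest such i)


S = frozenset({1, 2, 3})
print(mark_small([{1, 3}, {2}], [{1, 2}], S))  # ('FAIL', frozenset({2, 3}))
```

In the example, $\{2, 3\}$ meets both hyperedges of $\mathcal{G}$ and does not contain the only hyperedge $\{1, 2\}$ of $\mathcal{H}$, so it is indeed a new transversal.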

Then, each leaf a of A not yet marked is subjected to the following procedure: 

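PROCEDURE PROCESS($\alpha$):

1. Let $I_\alpha$ consist of those vertices of $\mathcal{H}_{S_\alpha}$ that occur in more than $|\mathcal{H}_{S_\alpha}|/2$ hyperedges of $\mathcal{H}_{S_\alpha}$.

2. IF $I_\alpha$ is a new transversal of $\mathcal{G}_{S_\alpha}$ with respect to $\mathcal{H}_{S_\alpha}$, THEN

{ $\mathrm{mark}(\alpha)$ := FAIL; $t(\alpha) := (V - S_\alpha) \cup I_\alpha$; EXIT PROCEDURE };

3. OTHERWISE IF there is a $G \in \mathcal{G}_{S_\alpha}$ such that $G \cap I_\alpha = \emptyset$, THEN let

$$\mathcal{C} = \{S_\alpha - (E - \{i\}) \mid E \in \mathcal{G}^{G}_{S_\alpha} \text{ and } i \in E \cap G\},$$
where $\mathcal{G}^{G}_{S_\alpha} = \mathcal{G}_{S_\alpha} - \{E' \in \mathcal{G}_{S_\alpha} \mid E' \subseteq S_\alpha - G\}$;

4. OTHERWISE IF there exists an $H \in \mathcal{H}_{S_\alpha}$ such that $H \subseteq I_\alpha$, THEN let

$$\mathcal{C} = \{S_\alpha - \{i\} \mid i \in H\} \cup \{H\};$$

5. Let $\kappa(\alpha) = |\mathcal{C}|$ and let the elements of $\mathcal{C}$ be $C_1, C_2, \ldots, C_{\kappa(\alpha)}$. For each $C_i$, $1 \le i \le \kappa(\alpha)$, create a new child $\alpha_i$ with $\mathrm{label}(\alpha_i) = (\mathrm{label}(\alpha), i)$,

$S_{\alpha_i} = C_i$, $\mathrm{inst}(\alpha_i) = (\mathcal{G}_{S_{\alpha_i}}, \mathcal{H}_{S_{\alpha_i}})$, $\mathrm{mark}(\alpha_i) = \text{NIL}$, and $t(\alpha_i) = \emptyset$.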
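The following Python sketch of PROCESS uses the same hypothetical representation (frozensets, with $\mathcal{G}_{S_\alpha}$ and $\mathcal{H}_{S_\alpha}$ passed in precomputed) and returns either a FAIL marking with its witness or the list $C_1, \ldots, C_{\kappa(\alpha)}$ of the children's vertex sets. Choices the text leaves open (which $G$ or $H$ to pick, and the order of the children) are fixed here by sorting, on the assumption that the child index $i$ must be well defined for the path-based computation of Section 4.

```python
from collections import Counter


def majority_vertices(h_s):
    """Step 1: vertices occurring in more than |H_S|/2 hyperedges of H_S."""
    counts = Counter(v for e in h_s for v in e)
    return frozenset(v for v, c in counts.items() if 2 * c > len(h_s))


def process(g_s, h_s, s, v):
    """Return ('FAIL', t, []) or ('NIL', frozenset(), [C_1, ..., C_k])."""
    i_a = majority_vertices(h_s)
    meets_all = all(e & i_a for e in g_s)
    if meets_all and not any(e <= i_a for e in h_s):          # Step 2
        return "FAIL", (v - s) | i_a, []
    missed = sorted((e for e in g_s if not e & i_a), key=sorted)
    if missed:                                                # Step 3
        big_g = missed[0]
        g_g = [e for e in g_s if not e <= s - big_g]
        children = {s - (e - {i}) for e in g_g for i in e & big_g}
    else:                                                     # Step 4: some H <= I_alpha
        big_h = next(e for e in sorted(h_s, key=sorted) if e <= i_a)
        children = {s - {i} for i in big_h} | {big_h}
    return "NIL", frozenset(), sorted(children, key=sorted)   # Step 5: fixed child order


V = frozenset({1, 2, 3})
maj = {frozenset({1, 2}), frozenset({1, 3}), frozenset({2, 3})}
print(process(maj, maj, V, V))  # Step 4 fires: children {1,2}, {1,3}, {2,3}
```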

Exhaustively apply the procedures MARKSMALL (to unmarked leaves $\alpha$ with $|\mathcal{H}_{S_\alpha}| \le 1$) and PROCESS (to all other unmarked leaves) until there are no unmarked leaves left in the tree. The resulting tree is $T(\mathcal{G}, \mathcal{H})$. The following proposition summarizes results by Boros and Makino [2].

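Proposition 2.1 (Boros and Makino [2]).

1. $\mathcal{H} = \mathrm{tr}(\mathcal{G})$ iff all leaves of $T(\mathcal{G}, \mathcal{H})$ are marked DONE.

2. The depth of $T(\mathcal{G}, \mathcal{H})$ is bounded by $\log |\mathcal{H}|$.

3. Each node $\alpha$ of $T(\mathcal{G}, \mathcal{H})$ has at most $|V| \cdot |\mathcal{G}|$ children, i.e., $\kappa(\alpha) \le |V| \cdot |\mathcal{G}|$, where $V$ is the set of vertices of $\mathcal{G}$ and $\mathcal{H}$.

4. In case $\mathcal{H} \neq \mathrm{tr}(\mathcal{G})$, $t(\alpha)$ is a new transversal of $\mathcal{G}$ with respect to $\mathcal{H}$ for every leaf $\alpha$ marked FAIL.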
3 A Complexity-Theoretic Lemma



For numerical functions $z$,¹ we denote by $\mathrm{DSPACE}[z(n)]$ ($\mathrm{FDSPACE}[z(n)]$) the class of all decision problems (computation problems) solvable deterministically in $O(z(n))$ space. For a function $f$, let $f^1 = f$ and, for $i \ge 1$, let $f^{i+1} = f \circ f^i$, where $\circ$ is the usual function composition, i.e., $(f \circ g)(x) = f(g(x))$ for each $x$ in the domain of $g$. Let $\mathcal{Q}$ denote the set of all functions computable in space $O(\log^2 n)$ from strings over an input alphabet to the non-negative natural numbers, and let $\mathcal{Q}_{\log}$ denote the subclass of $\mathcal{Q}$ containing all functions $p$ such that, for each input string $I$, $p(I)$ is $O(\log |I|)$. For each function $p \in \mathcal{Q}$, let $f^p$ denote the function that associates with each input $I$ the output $f^p(I) = f^{p(I)}(I)$. If FC denotes a functional complexity class, then $[\mathrm{FC}]^{\log}$ denotes the class of functions that can be built from some function $f$ in FC via a logarithmic number $p(I) = O(\log n)$ of self-compositions of $f$ for each input of size $n$:

$$[\mathrm{FC}]^{\log} = \bigcup_{f \in \mathrm{FC},\; p \in \mathcal{Q}_{\log}} \{f^p\}.$$

For a functional complexity class FC, the subclass $\mathrm{FC}_{\mathrm{pol}}$ is given by all functions $f$ of FC for which there exists a polynomial $\gamma$ such that for each input $I$ and for each $i \ge 1$, $|f^i(I)| \le \gamma(|I|)$. Note that for many classes FC, $\mathrm{FC}_{\mathrm{pol}}$ is a proper subclass of FC. This is, for example, the case for $\mathrm{FDSPACE}[\log n]$, i.e., functional logspace. For instance, let $f$ be the function that associates with an input of size $n$ an output consisting of $n^2$ zeroes. Clearly, $f \in \mathrm{FDSPACE}[\log n]$, but the output sizes $n^{2^i}$ of the $f^i$ are not bounded by any fixed polynomial when $i$ varies.
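The growth in this example is easy to reproduce: with $f$ mapping an input of length $n$ to $n^2$ zeroes, the $i$-th iterate has length $n^{2^i}$. A throwaway Python check, with the input "ab" (so $n = 2$) chosen arbitrarily:

```python
def f(x):
    """Output n**2 zeroes for an input of length n (computable in logspace)."""
    return "0" * (len(x) ** 2)


def iterate(func, p, x):
    """Compute func^p(x), i.e., the p-fold self-composition of func."""
    for _ in range(p):
        x = func(x)
    return x


for i in range(1, 4):
    print(i, len(iterate(f, i, "ab")))   # lengths 2**(2**i): 4, 16, 256
```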

Lemma 3.1. $[\mathrm{FDSPACE}[\log n]_{\mathrm{pol}}]^{\log} \subseteq \mathrm{FDSPACE}[\log^2 n]$.

Proof. The proof is similar to the well-known proof that for any two functions $f, g \in \mathrm{FDSPACE}[\log n]$, their composition $g \circ f$ is in $\mathrm{FDSPACE}[\log n]$, too; see, e.g., [25]. However, here the logarithmic (rather than constant) number of compositions is responsible for the blowup of the required space by a logarithmic factor. Let $f$ be a function from strings to strings in $\mathrm{FDSPACE}[\log n]_{\mathrm{pol}}$, realized by a logspace Turing machine $T$, and let $p \in \mathcal{Q}_{\log}$. In order to prove the lemma, it is sufficient to show that one can construct a single functional Turing machine $T^*$ with space bound $O(\log^2 n)$ that, for each input $I$ of length $n$, simulates the pipelined application $T^{p(I)}$ that outputs $f^{p(I)}(I)$. $T^*$ simulates an arrangement of $p(I)$ copies of $T$, say $T_1, T_2, \ldots, T_{p(I)}$, such that the input string $v_1$ to $T_1$ is $I$, and such that for $i \ge 1$, the input string $v_{i+1}$ to $T_{i+1}$ is equal to the output string $w_i$ of $T_i$. Given that the size of $w_i = T^i(I)$ is bounded by some fixed polynomial $\gamma$, there are numbers $a$ and $b$ such that each $T_i$ requires no more than space $a + b \log n$. When simulating the pipelined computation $T_{p(I)}(T_{p(I)-1}(\cdots(T_2(T_1(I)))))$ on a single Turing machine $T^*$, we have to avoid the effective storage of any intermediate output $w_i$ (or, equivalently, input $v_{i+1}$). To this aim, $T^*$ simulates each $T_i$ via a logspace procedure $P_i$ that maintains its own space area on the worktape of $T^*$. Each $P_i$ acts like $T_i$, except for the following modifications: For $1 \le i < p(I)$, $P_i$ has a single output bit, which is stored on the worktape of $T^*$; moreover, $P_i$ takes as input a dedicated special index register $d_i$ that specifies which output bit of $T_i$ is to be computed, computes only this output bit (suppressing all other output bits), and stores it in a single-bit register $o_i$. $T_i$'s access to its $j$-th input bit is then simulated by $P_i$ writing "$j$" (in binary) into the special index register $d_{i-1}$, starting $P_{i-1}$, and then waiting until $P_{i-1}$ writes the desired output bit into $o_{i-1}$, which corresponds to the correct value of the $j$-th output bit of $T_{i-1}$, and thus to the $j$-th input bit of $T_i$. $P_1$ and $P_{p(I)}$ work in a similar way, except that $P_1$ directly accesses the input string $I$ from the input tape of $T^*$, and $P_{p(I)}$, rather than suppressing some output bits, writes all output bits to the output tape of $T^*$.

The workspace required by each procedure $P_i$ is easily seen to be bounded by $a' + b' \log n$ for some fixed constants $a'$ and $b'$ independent of $n$. This reflects the $a + b \log n$ bits required to execute $T_i$, plus the little extra space $P_i$ may require for its index $d_i$, for the output bit $o_i$, and for a constant number of auxiliary counters and pointers (of size at most $a + b \log n$ bits each) for control and stack management for the $P_i$ procedures. Given that $p(I)$ is $O(\log n)$, $T^*$ requires $O(\log^2 n)$ space in total. □

¹ We only consider time- and space-constructible functions here.
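The bit-on-demand pipeline at the heart of this proof can be mimicked in Python. In the sketch below, which only illustrates the idea, a transducer is modelled by two hypothetical functions: one giving its output length and one computing its $j$-th output bit given random access to its input through a reader callback. The $p$-fold composition then never stores an intermediate string: each stage answers a bit query by querying the previous stage, roughly as $P_i$ uses the registers $d_{i-1}$ and $o_{i-1}$, with the Python call stack playing the role of the per-procedure work areas. Only the $p + 1$ stage lengths are stored.

```python
from typing import Callable, List, Tuple

Reader = Callable[[int], int]


class Transducer:
    """Hypothetical bit-level view of a function with polynomially bounded iterates."""

    def __init__(self, out_len: Callable[[int], int],
                 out_bit: Callable[[int, int, Reader], int]):
        self.out_len = out_len    # input length -> output length
        self.out_bit = out_bit    # (j, input length, input reader) -> j-th output bit


def compose_power(t: Transducer, p: int, read_input: Reader, n: int) -> Tuple[int, Reader]:
    """Return (length, reader) for t^p(input) without materializing intermediate outputs."""
    lengths: List[int] = [n]
    for _ in range(p):
        lengths.append(t.out_len(lengths[-1]))

    def make_reader(stage: int) -> Reader:
        if stage == 0:
            return read_input                     # the first stage reads the input directly
        prev = make_reader(stage - 1)
        # Bit j of stage `stage` is computed on demand from bits of stage `stage - 1`.
        return lambda j: t.out_bit(j, lengths[stage - 1], prev)

    return lengths[p], make_reader(p)


# Example transducer: x -> x followed by its bitwise complement (output length 2n).
double = Transducer(lambda n: 2 * n,
                    lambda j, n, read: read(j) if j < n else 1 - read(j - n))

x = [1, 0, 1]
length, reader = compose_power(double, 3, lambda k: x[k], len(x))
print(length, [reader(j) for j in range(length)])
```

The price for not storing intermediate strings is recomputation: every bit request at the last stage triggers a chain of requests down to the input, which is exactly the time-for-space trade-off the proof relies on.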

Note that the same space bound does not hold for $[\mathrm{FDSPACE}[\log n]]^{\log}$. In fact, for functions $f$ in this class, intermediate outputs $f^i(I)$ may be of superpolynomial size and, in the worst case, even of exponential size $n^{\Theta(n)}$. Therefore, when omitting the "pol" restriction, the best space bound we are able to show is $[\mathrm{FDSPACE}[\log n]]^{\log} \subseteq \mathrm{PSPACE}$. Since an $\mathrm{FDSPACE}[\log^2 n]$ Turing machine has an output of size at most $n^{O(\log n)}$, it is actually the case that $[\mathrm{FDSPACE}[\log n]]^{\log} \not\subseteq \mathrm{FDSPACE}[\log^2 n]$.

4 The Space Bound for Dual 

The main result of this section is that, for a pair $(\mathcal{G}, \mathcal{H})$, the entire decomposition tree $T(\mathcal{G}, \mathcal{H})$ (with all markings and labels) produced by the decomposition method of Boros and Makino as outlined in Section 2 can be computed in quadratic logspace. The other space-complexity results follow from this as simple corollaries.

We start with a lemma that provides a logarithmic space bound for computing the $i$-th child of a node $\alpha$ of the decomposition tree from the fully labeled node $\alpha$ and from the set $V$ of vertices of the original input instance, or for discovering that such a child does not exist. If $\alpha$ is a node of the decomposition tree, let us denote by $\mathrm{attr}(\alpha)$ the attributes of $\alpha$, i.e., the tuple $(\mathrm{label}(\alpha), S_\alpha, \mathrm{inst}(\alpha), \mathrm{mark}(\alpha), t(\alpha))$.

Lemma 4.1. There is a deterministic logspace procedure $\mathrm{NEXT}(V, \mathrm{attr}(\alpha), i)$ which, for each DUAL instance $(\mathcal{G}, \mathcal{H})$ over vertex set $V$, for each attribute set $\mathrm{attr}(\alpha)$ of a node $\alpha$ of $T(\mathcal{G}, \mathcal{H})$, and for each positive integer $i \le |V| \cdot |\mathcal{G}|$, outputs:

• $\mathrm{attr}(\alpha_i)$ if $\alpha_i$ is the $i$-th child of $\alpha$ in $T(\mathcal{G}, \mathcal{H})$;

• IMPOSSIBLE otherwise (i.e., if $\alpha$ has fewer than $i$ children).

Proof. First note that, by simple inspection, the procedures MARKSMALL and PROCESS given in Section 2 can be implemented by deterministic logspace transducers. In fact, these procedures only perform simple cardinality checks, counting, assignments, and set-theoretic operations, all of which are well known to run in logspace. A procedure NEXT as required can be constructed as follows. If $\mathrm{mark}(\alpha) \in \{\text{DONE}, \text{FAIL}\}$, then output IMPOSSIBLE; otherwise, perform the composition MARKSMALL*(PROCESS*($\alpha$)), where:

• PROCESS* works like PROCESS, except that it outputs only the $i$-th child of $\alpha$ if such a child exists, rather than outputting all children, and outputs IMPOSSIBLE otherwise; and

• MARKSMALL* works like MARKSMALL, except that it also accepts the input IMPOSSIBLE, in which case it also outputs IMPOSSIBLE.

These minor modifications of MARKSMALL and PROCESS clearly run in deterministic logspace; therefore, so does their composition, and hence so does the procedure NEXT. □

A path descriptor for a DUAL instance $I = (\mathcal{G}, \mathcal{H})$ over a vertex set $V$ is a list of length at most $\lfloor \log |\mathcal{H}| \rfloor$ whose elements are integers bounded by $|V| \cdot |\mathcal{G}|$. The set of all path descriptors for $I$ is denoted by $PD(I)$. Clearly, $PD(I) \subseteq \mathbb{N}^{\mathcal{H}}$, and each label $\mathrm{label}(\alpha)$ of a node $\alpha$ of $T(\mathcal{G}, \mathcal{H})$ is contained in $PD(I)$. Intuitively, a path descriptor, exactly in the same way as a label, is intended to describe a sequence of child indices that, starting from the root of $T(\mathcal{G}, \mathcal{H})$, leads to a specific node $\alpha$ of $T(\mathcal{G}, \mathcal{H})$. The root of $T(\mathcal{G}, \mathcal{H})$ is identified by the empty path descriptor. If $\pi = (i_1, i_2, \ldots, i_r)$ is a path descriptor, then $\mathrm{head}(\pi) = i_1$ and $\mathrm{tail}(\pi)$ is the path descriptor $(i_2, \ldots, i_r)$. Two path descriptors of the form $(i_1, \ldots, i_r)$ and $(i_1, \ldots, i_r, i_{r+1})$ are said to be consecutive.
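For concreteness, here is a small Python sketch of path descriptors as tuples, with $\mathrm{head}$, $\mathrm{tail}$, and the consecutiveness test. The parameters `max_len` and `bound` are our stand-ins for $\lfloor \log |\mathcal{H}| \rfloor$ and $|V| \cdot |\mathcal{G}|$ and are supplied by the caller. Since $|PD(I)| = \sum_{r=0}^{\lfloor \log |\mathcal{H}| \rfloor} (|V| \cdot |\mathcal{G}|)^r$ is quasi-polynomial, the descriptors are enumerated lazily rather than stored.

```python
from itertools import islice, product


def path_descriptors(max_len, bound):
    """Lazily enumerate all tuples of length <= max_len with entries in 1..bound."""
    for r in range(max_len + 1):
        yield from product(range(1, bound + 1), repeat=r)


def head(pi):
    return pi[0]


def tail(pi):
    return pi[1:]


def consecutive(pi, pi2):
    """(i_1, ..., i_r) and (i_1, ..., i_r, i_{r+1}) are consecutive."""
    return len(pi2) == len(pi) + 1 and pi2[:-1] == pi


print(list(islice(path_descriptors(2, 3), 5)))   # [(), (1,), (2,), (3,), (1, 1)]
print(head((2, 1, 3)), tail((2, 1, 3)))          # 2 (1, 3)
```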

Lemma 4.2. There is a procedure $\mathrm{PATHNODE}(I, \pi)$ that runs in deterministic space $O(\log^2 |I|)$ and that, for each DUAL input instance $I$ and path descriptor $\pi \in PD(I)$, outputs $\mathrm{attr}(\alpha)$ if $\pi$ corresponds to the label $\mathrm{label}(\alpha)$ of a node $\alpha$ in $T(\mathcal{G}, \mathcal{H})$, and outputs WRONGPATH otherwise.

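Proof. Let $I = (\mathcal{G}, \mathcal{H})$, $V = V(\mathcal{G})$, and $\pi \in PD(I)$, and let $\ell(\pi)$ denote the length of the sequence $\pi$ (recall that $\ell(\pi) \le \log |I|$). The procedure PATHNODE first computes $\mathrm{attr}(\alpha_0)$ for the root $\alpha_0$ of $T(\mathcal{G}, \mathcal{H})$ in deterministic logspace. It then computes $f^{\ell(\pi)}(V, \mathrm{attr}(\alpha_0), \pi)$, where $f$ is the function corresponding to the procedure $F$ described as follows. $F$ accepts as input either the string WRONGPATH or a triple $(W, \mathit{attr}, \pi)$, where $W$ is a set, $\mathit{attr}$ is a data structure of the same format as the attributes $\mathrm{attr}(\beta)$ of some vertex $\beta$ in a decomposition tree, and $\pi$ is a sequence of positive integers. On all other inputs, $F$ outputs the empty string. On input WRONGPATH, $F$ outputs WRONGPATH; otherwise, $F$ computes $F'(\mathrm{NEXT}(W, \mathit{attr}, \mathrm{head}(\pi)))$, where NEXT is as specified in Lemma 4.1, and where $F'$ is as follows: $F'$ outputs WRONGPATH if $\mathrm{NEXT}(W, \mathit{attr}, \mathrm{head}(\pi))$ = IMPOSSIBLE, and $F'$ outputs $(W, \mathit{attr}', \mathrm{tail}(\pi))$ whenever $\mathrm{NEXT}(W, \mathit{attr}, \mathrm{head}(\pi)) = \mathit{attr}'$ for some attribute description $\mathit{attr}'$. Since NEXT runs in deterministic logspace, so do $F'$ and $F$, and therefore $f$ is a logspace-computable function. Moreover, every attribute tuple consists of a label of at most $\lfloor \log |\mathcal{H}| \rfloor$ integers bounded by $|V| \cdot |\mathcal{G}|$, two subsets of $V$, a sub-instance of size at most $|I|$, and a marking, so all iterates $f^i$ have output size polynomial in $|I|$; that is, $f \in \mathrm{FDSPACE}[\log n]_{\mathrm{pol}}$.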

By construction and by Lemma 4.1, PATHNODE precisely computes the attributes $\mathrm{attr}(\alpha)$ if there is a node $\alpha$ with $\mathrm{label}(\alpha) = \pi$ in $T(\mathcal{G}, \mathcal{H})$, whereas otherwise PATHNODE outputs WRONGPATH. Since $\ell(\pi)$ is clearly in $\mathcal{Q}_{\log}$, by Lemma 3.1, $f^{\ell(\pi)}(V, \mathrm{attr}(\alpha_0), \pi)$ can be computed in deterministic space $O(\log^2 n)$, and therefore so can PATHNODE. □
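The control flow of PATHNODE is simply an $\ell(\pi)$-fold iteration of $F$. The Python sketch below makes this explicit for an arbitrary NEXT-like function passed in as a parameter; the toy tree, given by child counts, is a hypothetical stand-in for $T(\mathcal{G}, \mathcal{H})$. Unlike the construction in the proof, this loop stores each intermediate triple; the $O(\log^2 n)$ bound comes from evaluating the iterates as in Lemma 3.1 instead.

```python
WRONGPATH = "WRONGPATH"
IMPOSSIBLE = None


def step_f(state, next_fn):
    """One application of F: follow head(pi) one level down, or report WRONGPATH."""
    if state == WRONGPATH:
        return WRONGPATH
    w, attr, pi = state
    child = next_fn(w, attr, pi[0])
    if child is IMPOSSIBLE:
        return WRONGPATH
    return (w, child, pi[1:])


def pathnode(w, root_attr, pi, next_fn):
    """Compute f^{l(pi)}(W, attr(root), pi) and return the attributes it reaches."""
    state = (w, root_attr, tuple(pi))
    for _ in range(len(pi)):
        state = step_f(state, next_fn)
    return WRONGPATH if state == WRONGPATH else state[1]


# Toy tree: attributes are just labels; node label -> number of children.
num_children = {(): 2, (1,): 3}


def toy_next(w, attr, i):
    return attr + (i,) if i <= num_children.get(attr, 0) else IMPOSSIBLE


print(pathnode(None, (), (1, 3), toy_next))  # (1, 3)
print(pathnode(None, (), (2, 1), toy_next))  # WRONGPATH: node (2,) is a leaf
```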

Using a procedure PATHNODE according to the above lemma, we are now ready to formulate an algorithm DECOMPOSE that computes the decomposition tree $T(\mathcal{G}, \mathcal{H})$ of a DUAL instance $(\mathcal{G}, \mathcal{H})$. The algorithm first lists the vertices and then the edges of the tree $T(\mathcal{G}, \mathcal{H})$.

Algorithm DECOMPOSE:

Input: DUAL instance I = (G, H); Output: T(G, H).
BEGIN
  OUTPUT("Vertices:");
  FOR each path descriptor π ∈ PD(I) DO
    IF PATHNODE(I, π) ≠ WRONGPATH THEN OUTPUT(PATHNODE(I, π));
  OUTPUT("Edges:");
  FOR each pair π, π' of consecutive path descriptors in PD(I) DO
  BEGIN
    a := PATHNODE(I, π);
    a' := PATHNODE(I, π');
    IF a' ≠ WRONGPATH THEN OUTPUT((label(a), label(a')));
  END
END.
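In Python, DECOMPOSE can be sketched as a generator over lazily enumerated path descriptors, with PATHNODE passed in as a function. Two assumptions are ours: the toy PATHNODE below simply recognizes a fixed set of labels (so attributes coincide with labels), and, instead of looping over all pairs, each consecutive pair is generated from its longer member $\pi'$ by dropping the last entry, which yields exactly the same pairs.

```python
from itertools import product

WRONGPATH = "WRONGPATH"


def descriptors(max_len, bound):
    for r in range(max_len + 1):
        yield from product(range(1, bound + 1), repeat=r)


def decompose(pathnode, max_len, bound):
    """Yield ('vertex', attr) for each tree node, then ('edge', (a, a')) for each tree edge."""
    for pi in descriptors(max_len, bound):
        a = pathnode(pi)
        if a != WRONGPATH:
            yield ("vertex", a)
    for pi2 in descriptors(max_len, bound):
        if pi2:
            pi = pi2[:-1]  # the unique descriptor consecutive to pi2
            a2 = pathnode(pi2)
            if a2 != WRONGPATH:
                yield ("edge", (pathnode(pi), a2))  # here attr == label


labels = {(), (1,), (2,), (1, 1)}


def toy_pathnode(pi):
    return pi if pi in labels else WRONGPATH


for item in decompose(toy_pathnode, max_len=2, bound=2):
    print(item)
```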

Theorem 4.1. Algorithm DECOMPOSE computes the decomposition tree $T(\mathcal{G}, \mathcal{H})$ of a DUAL instance $(\mathcal{G}, \mathcal{H})$ in space $O(\log^2 n)$.

Proof. The correctness of the algorithm follows from the correctness of PATHNODE, as shown in Lemma 4.2. For the space bound, note that each path descriptor consists of at most $\log |I|$ integers of $O(\log n)$ bits each and thus requires only $O(\log^2 n)$ bits, and that we can therefore iterate (by re-using workspace) over all path descriptors and all pairs of path descriptors in $O(\log^2 n)$ space. Given that, by Lemma 4.2, PATHNODE also runs in $O(\log^2 n)$ space, the entire DECOMPOSE algorithm needs only $O(\log^2 n)$ space. □
Corollary 4.1.

1. Deciding DUAL is in $\mathrm{DSPACE}[\log^2 n]$.

2. If $\mathrm{tr}(\mathcal{G}) \neq \mathcal{H}$, then computing a new transversal of $\mathcal{G}$ with respect to $\mathcal{H}$ is in $\mathrm{FDSPACE}[\log^2 n]$.

Proof. In both cases, we can first compute the entire decomposition tree $T(\mathcal{G}, \mathcal{H})$ in $\mathrm{FDSPACE}[\log^2 n]$, and then (i) for problem 1, check by a DLOGSPACE procedure whether all leaves are marked DONE, and (ii) for problem 2, use an FLOGSPACE procedure to find a node $\alpha$ marked FAIL in $T(\mathcal{G}, \mathcal{H})$ and output its component $t(\alpha)$. Let $\circ$ denote the composition operator for complexity classes in the obvious sense. Given that $\mathrm{FDSPACE}[\log^2 n] \circ \mathrm{DLOGSPACE} = \mathrm{DSPACE}[\log^2 n]$ and, moreover, $\mathrm{FDSPACE}[\log^2 n] \circ \mathrm{FLOGSPACE} = \mathrm{FDSPACE}[\log^2 n]$, the complexity bounds follow. Alternatively, we can solve problems 1 and 2 directly by respective slight modifications of DECOMPOSE. □

Note that if $\mathrm{tr}(\mathcal{G}) \neq \mathcal{H}$, the witness $t(\alpha)$ produced is not necessarily a minimal transversal of $\mathcal{G}$, but is, in general, just a transversal of $\mathcal{G}$ that contains no edge of $\mathcal{H}$ and thus witnesses that $\mathrm{tr}(\mathcal{G}) \neq \mathcal{H}$, because $t(\alpha)$ must contain a missing minimal transversal of $\mathcal{G}$. From $t(\alpha)$, such a minimal transversal $t$ can easily be computed in polynomial time by first letting $t := t(\alpha)$ and then successively eliminating vertices $v$ from $t$ for which $t - \{v\}$ is still a transversal of $\mathcal{G}$. However, this process requires space linear in the size of the vertex set $V$ to remember the eliminated vertices, plus logarithmic space in the instance size $|(\mathcal{G}, \mathcal{H})|$ for checking. This is still better than polynomial space in the full instance size, but it is not quite quadratic logspace. It is currently not clear whether there exists a smarter algorithm that requires quadratic logspace only.
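The elimination procedure just described is easy to state in code. This Python sketch keeps the current candidate as an explicit set of vertices, which corresponds to the space linear in $|V|$ mentioned above; the function names are hypothetical. A single pass over the vertices suffices: if $t - \{v\}$ is not a transversal at the moment $v$ is examined, removing further vertices later cannot make it one.

```python
def is_transversal(t, g):
    return all(t & e for e in g)


def minimize_transversal(t, g):
    """Greedily drop vertices from a transversal t of G while it stays a transversal."""
    current = set(t)
    for v in sorted(t):
        if is_transversal(current - {v}, g):
            current.remove(v)
    return frozenset(current)


G = [{1, 2}, {2, 3}]
print(minimize_transversal({1, 2, 3}, G))  # frozenset({2})
```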

5 Tightening the complexity bound 

By the results of the previous section, Dual and its complement $\overline{\text{Dual}}$ are in quadratic logspace, i.e., in $\mathrm{DSPACE}[\log^2 n]$. On the other hand, as already mentioned, the complement of Dual is in $\beta_2\mathrm{P}$, the class of problems solvable in polynomial time with $O(\log^2 n)$ nondeterministic guesses. This class is identical to the class $\mathrm{GC}(\log^2 n, \mathrm{P})$ of the so-called Guess and Check model of limited nondeterminism [EdU], where $O(\log^2 n)$ nondeterministic bits are guessed before the proper computation starts and are appended to the input.

Given that the class P is generally believed to be incomparable with $\mathrm{DSPACE}[\log^2 n]$ (cf. [ED]), and given that $\mathrm{P} \subseteq \mathrm{GC}(\log^2 n, \mathrm{P})$, it is very likely that $\mathrm{GC}(\log^2 n, \mathrm{P})$ and $\mathrm{DSPACE}[\log^2 n]$ are incomparable as well. Since $\overline{\text{Dual}}$ belongs to both classes, this suggests that neither characterizes Dual well, and that $\overline{\text{Dual}}$ is unlikely to be complete for either. This observation prompted us to look for a complexity class containing $\overline{\text{Dual}}$ that is contained in both $\mathrm{GC}(\log^2 n, \mathrm{P})$ and $\mathrm{DSPACE}[\log^2 n]$, and that thus constitutes a tighter upper complexity bound for $\overline{\text{Dual}}$ than all those we have seen so far. In this section, we present precisely such a complexity class. In order to describe this class, we state some definitions and prove a lemma.

Let $\mathcal{C}$ be a complexity class and $s$ a numerical function. Then $\mathrm{GC}(s(n), \mathcal{C})$ is the class of all languages $L$ for which there exists a language $A \in \mathcal{C}$ such that an input string $I$ is in $L$ iff there is a string $J$ of $O(s(|I|))$ bits such that $(I, J) \in A$. In other words, $L$ is in $\mathrm{GC}(s(n), \mathcal{C})$ iff the membership of a string $I$ in $L$ can be checked in $\mathcal{C}$ after having guessed $O(s(n))$ nondeterministic bits that can be used as an additional input. The class $[\mathrm{LOGSPACE}_{\mathrm{pol}}]^{\log}$ is defined as the composition of $[\mathrm{FDSPACE}[\log n]_{\mathrm{pol}}]^{\log}$ with $\mathrm{LOGSPACE} = \mathrm{DSPACE}[\log n]$; formally:

$$[\mathrm{LOGSPACE}_{\mathrm{pol}}]^{\log} = [\mathrm{FDSPACE}[\log n]_{\mathrm{pol}}]^{\log} \circ \mathrm{LOGSPACE}.$$
Thus, an input $I$ is first transformed to an output $O$ by a $[\mathrm{FDSPACE}[\log n]_{\mathrm{pol}}]^{\log}$ procedure, after which $O$ is submitted to a LOGSPACE decision procedure, which decides based on $O$ whether the original input $I$ is accepted or rejected.²
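The shape of a $[\mathrm{LOGSPACE}_{\mathrm{pol}}]^{\log}$ decision procedure can be sketched in Python as below. The sketch models only the composition structure, not space usage: `f` and `g` are hypothetical stand-ins for the logspace transducer and the logspace decision procedure, the number of stages is assumed to be $c\lceil\log_2 n\rceil$ for a constant `c`, and the polynomial bound on intermediate outputs is assumed to be $n^d$, where $n$ is the size of the original input.

```python
import math
from typing import Callable


def log_composition(f: Callable[[str], str], g: Callable[[str], bool],
                    c: int = 1, d: int = 2) -> Callable[[str], bool]:
    """Build the decision procedure x -> g(f(f(...f(x)...))) with O(log n) stages."""

    def decide(x: str) -> bool:
        n = max(len(x), 2)                    # size of the ORIGINAL input (>= 2 so log2 n >= 1)
        stages = c * math.ceil(math.log2(n))  # O(log n) self-compositions
        bound = n ** d                        # assumed polynomial bound on intermediate outputs
        y = x
        for _ in range(stages):
            y = f(y)                          # one more application of the transducer
            if len(y) > bound:
                raise ValueError("intermediate output exceeds the polynomial bound")
        return g(y)                           # final logspace decision on the last output

    return decide
```

For example, with `f = lambda s: s + "1"` and an 8-character input, the sketch applies `f` three times before calling `g`.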

Lemma 5.1. Given a Dual instance $I = (\mathcal{G}, \mathcal{H})$ and a path descriptor $\pi \in PD(I)$, deciding whether PATHNODE$(I, \pi)$ outputs a leaf of $T(\mathcal{G}, \mathcal{H})$ whose mark-component is FAIL is in $[\mathrm{LOGSPACE}_{\mathrm{pol}}]^{\log}$.

Proof. The proof of Lemma 4.2 already shows that PATHNODE is in $[\mathrm{FDSPACE}[\log n]_{\mathrm{pol}}]^{\log}$. Deciding whether PATHNODE$(I, \pi)$ outputs a leaf of $T(\mathcal{G}, \mathcal{H})$ whose mark-component is FAIL can thus be implemented by first executing PATHNODE$(I, \pi)$ and then checking whether the output is a node labeled FAIL. This is obviously in $[\mathrm{FDSPACE}[\log n]_{\mathrm{pol}}]^{\log} \circ \mathrm{LOGSPACE} = [\mathrm{LOGSPACE}_{\mathrm{pol}}]^{\log}$. □

Finally, we study the main class of this section: $\mathrm{GC}(\log^2 n, [\mathrm{LOGSPACE}_{\mathrm{pol}}]^{\log})$.

Theorem 5.1. $\overline{\text{Dual}} \in \mathrm{GC}(\log^2 n, [\mathrm{LOGSPACE}_{\mathrm{pol}}]^{\log})$.

Proof. In order to find a new transversal $t$ of $\mathcal{G}$ with respect to $\mathcal{H}$ for a Dual instance $I = (\mathcal{G}, \mathcal{H})$, rather than computing the entire decomposition tree $T(\mathcal{G}, \mathcal{H})$, it is sufficient to guess a branch of this tree that terminates in a leaf $\alpha$ labeled FAIL, and then compute $t(\alpha)$. Guessing such a branch amounts to guessing a path descriptor $\pi$ and then checking that PATHNODE$(I, \pi)$ outputs a node marked FAIL. Guessing $\pi$ amounts to guessing $O(\log^2 n)$ bits, and this is all our guess-and-check algorithm guesses. Checking that PATHNODE$(I, \pi)$ is a FAIL node is, by Lemma 5.1, in $[\mathrm{LOGSPACE}_{\mathrm{pol}}]^{\log}$; hence the overall computation can be done within $\mathrm{GC}(\log^2 n, [\mathrm{LOGSPACE}_{\mathrm{pol}}]^{\log})$. □
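To see why this guess cannot simply be removed by brute force in polynomial time, while remaining harmless for space, assume for illustration that path descriptors have at most $c\log_2^2 n$ bits for some constant $c$. Then the number of candidate guesses is only quasi-polynomial:

$$2^{c\log_2^2 n} = \left(2^{\log_2 n}\right)^{c\log_2 n} = n^{c\log_2 n},$$

whereas each single candidate fits in $O(\log^2 n)$ bits of workspace, which is what the enumeration argument in the proof of Theorem 5.2 exploits.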

The last theorem of this section shows, as promised, that $\mathrm{GC}(\log^2 n, [\mathrm{LOGSPACE}_{\mathrm{pol}}]^{\log})$ is indeed a subclass of both previously tightest upper bounds, which are most likely incomparable to each other: $\mathrm{DSPACE}[\log^2 n]$ and $\mathrm{GC}(\log^2 n, \mathrm{P}) = \beta_2\mathrm{P}$.

Theorem 5.2. $\mathrm{GC}(\log^2 n, [\mathrm{LOGSPACE}_{\mathrm{pol}}]^{\log}) \subseteq \mathrm{DSPACE}[\log^2 n] \cap \mathrm{GC}(\log^2 n, \mathrm{P})$.

Proof. For the inclusion $\mathrm{GC}(\log^2 n, [\mathrm{LOGSPACE}_{\mathrm{pol}}]^{\log}) \subseteq \mathrm{DSPACE}[\log^2 n]$, note that a decision procedure in $\mathrm{GC}(\log^2 n, [\mathrm{LOGSPACE}_{\mathrm{pol}}]^{\log})$ amounts to (i) guessing $O(\log^2 n)$ bits, which can be simulated by an exhaustive enumeration of all possible guesses (under re-use of space), which is feasible in $\mathrm{DSPACE}[\log^2 n]$, and (ii) for each such simulated guess, performing a check that lies in the complexity class $[\mathrm{FDSPACE}[\log n]_{\mathrm{pol}}]^{\log} \circ \mathrm{LOGSPACE}$. Since, by Lemma 1, $[\mathrm{FDSPACE}[\log n]_{\mathrm{pol}}]^{\log} \subseteq \mathrm{FDSPACE}[\log^2 n]$, and given that the composition of a function in $\mathrm{FDSPACE}[\log^2 n]$ with a LOGSPACE computation yields a $\mathrm{DSPACE}[\log^2 n]$ computation, the overall computation is in $\mathrm{DSPACE}[\log^2 n]$.
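Step (i), the deterministic simulation of the guess, can be sketched in Python as follows. Here `check` is a hypothetical stand-in for the $[\mathrm{LOGSPACE}_{\mathrm{pol}}]^{\log}$ checker, and the guess length $c\lceil\log_2 n\rceil^2$ is an assumed concrete form of $O(\log^2 n)$. `itertools.product` generates the guesses lazily, so only the current guess is held at any time; this mirrors the re-use of workspace in the proof, but the sketch does not itself establish any space bound.

```python
import math
from itertools import product
from typing import Callable, Tuple


def guess_bits(n: int, c: int = 1) -> int:
    """Assumed guess length c * ceil(log2 n)^2, i.e. O(log^2 n) bits."""
    return c * math.ceil(math.log2(max(n, 2))) ** 2


def simulate_guess_and_check(check: Callable[[str, Tuple[int, ...]], bool],
                             x: str, num_bits: int) -> bool:
    """Accept x iff some guess J in {0,1}^num_bits makes check(x, J) true."""
    for guess in product((0, 1), repeat=num_bits):
        # Each guess replaces the previous one; no other state is kept.
        if check(x, guess):
            return True
    return False
```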

To establish the inclusion $\mathrm{GC}(\log^2 n, [\mathrm{LOGSPACE}_{\mathrm{pol}}]^{\log}) \subseteq \mathrm{GC}(\log^2 n, \mathrm{P})$, it suffices to show that $[\mathrm{LOGSPACE}_{\mathrm{pol}}]^{\log} \subseteq \mathrm{PTIME}$. This is indeed the case. A decision procedure in $[\mathrm{LOGSPACE}_{\mathrm{pol}}]^{\log}$ amounts to a pipelined execution of $O(\log n)$ instantiations of a logspace function $f$, where the intermediate results are guaranteed to be of polynomial size in the original input, followed by the application of a logspace Boolean procedure $g$. This can be replaced by the pipelined execution of $O(\log n)$ instances of a PTIME procedure equivalent to $f$, followed by the application of a Boolean PTIME procedure equivalent to $g$. In total, this latter process is in PTIME because it amounts to a logarithmic number of invocations of a PTIME procedure, where each time the input size is bounded by a polynomial in the size $n$ of the overall input. Therefore, $[\mathrm{LOGSPACE}_{\mathrm{pol}}]^{\log} \subseteq \mathrm{PTIME}$. □
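The counting behind the last step can be made explicit. Assume, for illustration, that every intermediate result has size at most $p(n) \ge n$, that the PTIME procedures equivalent to $f$ and $g$ run in time $q$ and $q'$ respectively (both nondecreasing polynomials), and that there are at most $c\log_2 n$ stages. Then the total running time satisfies

$$T(n) \;\le\; \sum_{i=1}^{c\log_2 n} q\big(p(n)\big) + q'\big(p(n)\big) \;=\; c\log_2 n \cdot q\big(p(n)\big) + q'\big(p(n)\big),$$

which is polynomial in $n$, because compositions of polynomials are polynomials and $\log_2 n \le n$.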

² Note that $[\mathrm{LOGSPACE}_{\mathrm{pol}}]^{\log}$ is genuinely a complexity class defined in terms of machines and resource bounds. In addition to classical resources such as the amount of workspace, we here involve somewhat more unusual resources, such as the allowed number of self-compositions, which is here bounded by $O(\log n)$ (whence the superscript log), and the allowed size of intermediate outputs in compositions, which is here polynomially bounded (whence the subscript pol).
6 Summary, discussion, and conclusion

[Figure 1: inclusion diagram of the complexity classes relevant to Dual; recoverable node labels include LOGSPACE, $\mathrm{DSPACE}[\log^2 n]$, and PSPACE.]

In this paper we have derived new complexity bounds for the Dual (or $\overline{\text{Dual}}$) problem, showing that these problems can, in principle, be solved by space-efficient algorithms. These bounds are depicted in Figure 1 in relation to the other relevant complexity classes; there, set inclusion is visualized by ascending lines or paths. We believe that our results represent some progress in the long and rather tortuous battle towards a better understanding of the mysterious Dual problem. We do not claim that our results have immediate practical consequences, but we do hope that these bounds will prove useful. Firstly, the $O(\log^2 n)$ space bound indicates that space-efficient algorithms exist, and this encourages us to look for practical space-efficient solution methods for Dual and its equivalent problems in data mining and in the database area. We feel that space efficiency may be an advantage when dealing with big data stemming from financial transactions, biological experiments, and so on. When mining terabytes of data, we might want to trade workspace (i.e., main memory) for runtime. Our results state that this is, in principle, feasible. Future research will show whether our space bound can be exploited to come up with a reasonable algorithm. Secondly, our results may serve as a guide for research towards a matching bound for the Dual problem. We have reasons not to believe that Dual is hard for co-$\mathrm{GC}(\log^2 n, [\mathrm{LOGSPACE}_{\mathrm{pol}}]^{\log})$. This upper bound nevertheless gives us some intuition about where to dig further.

Acknowledgments: The author is grateful to E. Allender, E. Boros, K. Makino, E. Malizia, P. Rossmanith, and H. Vollmer for help with technical questions and references.